
Data Cleansing Meets Feature Selection: A Supervised Machine Learning Approach



Abstract

This paper presents a novel procedure for applying, in a sequential way, two data preparation techniques of a different nature: data cleansing and feature selection. For the former, we experimented with a partial removal of outliers via the inter-quartile range (IQR); for the latter, we chose relevant attributes with two widespread feature subset selectors, CFS (Correlation-based Feature Selection) and CNS (Consistency-based Feature Selection), which are founded on correlation and consistency measures, respectively. Empirical results are outlined on seven difficult binary and multi-class data sets, that is, data sets on which C4.5 or 1-nearest neighbour classifiers, without any kind of prior data pre-processing, have a test error rate of at least 10% according to accuracy. Non-parametric statistical tests indicate that combining the two aforementioned data preparation strategies, using a correlation measure for feature selection, with the C4.5 algorithm is significantly better, as measured by the ROC measure, than applying the data cleansing approach alone. Last but not least, PART, a weak and not very powerful learner, achieved promising results with the new proposal based on a consistency measure and is able to compete with the best configuration of C4.5. To sum up, under the new approach and for the ROC measure, the PART classifier with a consistency metric behaves slightly better than C4.5 with a correlation measure.
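The sequential procedure summarised above (outlier removal first, then feature subset selection on the cleaned data) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say what makes the IQR removal "partial", so here an instance is hypothetically dropped when any attribute lies outside Tukey's fences [Q1 - 1.5*IQR, Q3 + 1.5*IQR]. CFS is scored with symmetrical uncertainty on discrete attributes, and a simple greedy forward search stands in for the best-first search often used with CFS. All function names (`iqr_filter`, `cfs_merit`, `cfs_forward_select`, `cleanse_then_select`) are hypothetical, and the CNS selector is not shown.

```python
import math
from collections import Counter
from statistics import quantiles


def iqr_filter(rows, labels, k=1.5):
    """Drop every instance whose value on any attribute lies outside
    Tukey's fences [Q1 - k*IQR, Q3 + k*IQR], computed per attribute.
    (Assumed reading of "partial removal of outliers via IQR".)"""
    fences = []
    for j in range(len(rows[0])):
        q1, _, q3 = quantiles([r[j] for r in rows], n=4, method="inclusive")
        iqr = q3 - q1
        fences.append((q1 - k * iqr, q3 + k * iqr))
    keep = [i for i, r in enumerate(rows)
            if all(lo <= v <= hi for v, (lo, hi) in zip(r, fences))]
    return [rows[i] for i in keep], [labels[i] for i in keep]


def entropy(values):
    """Shannon entropy (bits) of a sequence of discrete values."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a correlation score in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def cfs_merit(subset, columns, labels):
    """CFS merit of a subset of k attributes:
    k * mean(r_cf) / sqrt(k + k*(k-1) * mean(r_ff)),
    where r_cf is attribute-class and r_ff attribute-attribute correlation."""
    k = len(subset)
    r_cf = sum(symmetrical_uncertainty(columns[f], labels) for f in subset) / k
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = (sum(symmetrical_uncertainty(columns[a], columns[b]) for a, b in pairs)
            / len(pairs)) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def cfs_forward_select(rows, labels):
    """Greedy forward search: repeatedly add the attribute that most increases
    the merit (earliest index wins ties); stop when no addition improves it."""
    columns = [list(col) for col in zip(*rows)]
    selected, best = [], 0.0
    while True:
        best_f = None
        for f in range(len(columns)):
            if f in selected:
                continue
            merit = cfs_merit(selected + [f], columns, labels)
            if merit > best:
                best_f, best = f, merit
        if best_f is None:
            return sorted(selected)
        selected.append(best_f)


def cleanse_then_select(rows, labels, k=1.5):
    """Sequential pipeline: IQR outlier removal first, then CFS on the cleaned data."""
    clean_rows, clean_labels = iqr_filter(rows, labels, k)
    features = cfs_forward_select(clean_rows, clean_labels)
    reduced = [[r[f] for f in features] for r in clean_rows]
    return reduced, clean_labels, features


if __name__ == "__main__":
    # Toy data: attribute 0 determines the class, attribute 1 is a redundant
    # copy of it, attribute 2 is class-independent noise; the last row carries
    # an outlying value (50) on attribute 2.
    X = [[0, 0, 0], [0, 0, 1], [0, 0, 0], [0, 0, 1],
         [1, 1, 0], [1, 1, 1], [1, 1, 0], [1, 1, 1], [1, 1, 50]]
    y = [0, 0, 0, 0, 1, 1, 1, 1, 1]
    X_sel, y_sel, features = cleanse_then_select(X, y)
    print(features, len(X_sel))  # -> [0] 8
```

In the toy run, the IQR step removes the ninth instance, and CFS then keeps only attribute 0: adding its exact copy leaves the merit unchanged because the denominator grows with the inter-attribute correlation. This redundancy penalty is what distinguishes a subset evaluator such as CFS from ranking attributes one at a time.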
